Open In Colab

Annolid on Detectron2 Tutorial

Welcome to Annolid on detectron2! This tutorial is adapted from the official detectron2 Colab tutorial. Here, we will go through some basic usage of detectron2, including the following:

  • Run inference on images or videos, with an existing detectron2 model

  • Train a detectron2 model on a new dataset

You can make a copy of this tutorial by “File -> Open in playground mode” and play with it yourself. DO NOT request access to this tutorial.

Install detectron2

# Is running in colab or in jupyter-notebook
try:
  import google.colab
  IN_COLAB = True
except ImportError:
  IN_COLAB = False
# install dependencies: 
!pip install pyyaml==5.3
import torch, torchvision
TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
# Install detectron2 that matches the above pytorch version
# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/$CUDA_VERSION/torch$TORCH_VERSION/index.html
# If there is not yet a detectron2 release that matches the given torch + CUDA version, you need to install a different pytorch.

# exit(0)  # After installation, you may need to "restart runtime" in Colab. This line can also restart runtime
Requirement already satisfied: pyyaml==5.3 in /home/jeremy/anaconda3/lib/python3.7/site-packages (5.3)
torch:  1.10 ; cuda:  cu102
Looking in links: https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.10/index.html
Requirement already satisfied: detectron2 in /mnt/home_nas/jeremy/Recherches/Postdoc/CPLab/Projects/Annolid/annolid/detectron2 (0.6)
Requirement already satisfied: Pillow>=7.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (8.2.0)
Requirement already satisfied: matplotlib in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (3.2.2)
Requirement already satisfied: pycocotools>=2.0.2 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (2.0.2)
Requirement already satisfied: termcolor>=1.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (1.1.0)
Requirement already satisfied: yacs>=0.1.8 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (0.1.8)
Requirement already satisfied: tabulate in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (0.8.9)
Requirement already satisfied: cloudpickle in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (1.6.0)
Requirement already satisfied: tqdm>4.29.0 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (4.56.0)
Requirement already satisfied: tensorboard in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (2.7.0)
Requirement already satisfied: fvcore<0.1.6,>=0.1.5 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (0.1.5.post20211023)
Requirement already satisfied: iopath<0.1.10,>=0.1.7 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (0.1.9)
Requirement already satisfied: future in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (0.18.2)
Requirement already satisfied: pydot in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (1.4.2)
Requirement already satisfied: omegaconf>=2.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (2.1.1)
Requirement already satisfied: hydra-core>=1.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (1.1.1)
Requirement already satisfied: black==21.4b2 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from detectron2) (21.4b2)
Requirement already satisfied: typed-ast>=1.4.2 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from black==21.4b2->detectron2) (1.4.3)
Requirement already satisfied: toml>=0.10.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from black==21.4b2->detectron2) (0.10.2)
Requirement already satisfied: regex>=2020.1.8 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from black==21.4b2->detectron2) (2021.4.4)
Requirement already satisfied: mypy-extensions>=0.4.3 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from black==21.4b2->detectron2) (0.4.3)
Requirement already satisfied: typing-extensions>=3.7.4 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from black==21.4b2->detectron2) (3.7.4.3)
Requirement already satisfied: click>=7.1.2 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from black==21.4b2->detectron2) (7.1.2)
Requirement already satisfied: pathspec<1,>=0.8.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from black==21.4b2->detectron2) (0.8.1)
Requirement already satisfied: appdirs in /home/jeremy/anaconda3/lib/python3.7/site-packages (from black==21.4b2->detectron2) (1.4.4)
Requirement already satisfied: pyyaml>=5.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from fvcore<0.1.6,>=0.1.5->detectron2) (5.3)
Requirement already satisfied: numpy in /home/jeremy/anaconda3/lib/python3.7/site-packages (from fvcore<0.1.6,>=0.1.5->detectron2) (1.20.1)
Requirement already satisfied: antlr4-python3-runtime==4.8 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from hydra-core>=1.1->detectron2) (4.8)
Requirement already satisfied: importlib-resources in /home/jeremy/anaconda3/lib/python3.7/site-packages (from hydra-core>=1.1->detectron2) (5.1.0)
Requirement already satisfied: portalocker in /home/jeremy/anaconda3/lib/python3.7/site-packages (from iopath<0.1.10,>=0.1.7->detectron2) (2.3.2)
Requirement already satisfied: setuptools>=18.0 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from pycocotools>=2.0.2->detectron2) (52.0.0.post20210125)
Requirement already satisfied: cython>=0.27.3 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from pycocotools>=2.0.2->detectron2) (0.29.23)
Requirement already satisfied: cycler>=0.10 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from matplotlib->detectron2) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from matplotlib->detectron2) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from matplotlib->detectron2) (1.3.1)
Requirement already satisfied: python-dateutil>=2.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from matplotlib->detectron2) (2.8.1)
Requirement already satisfied: google-auth<3,>=1.6.3 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from tensorboard->detectron2) (1.24.0)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from tensorboard->detectron2) (0.4.6)
Requirement already satisfied: werkzeug>=0.11.15 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from tensorboard->detectron2) (2.0.1)
Requirement already satisfied: grpcio>=1.24.3 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from tensorboard->detectron2) (1.42.0)
Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from tensorboard->detectron2) (0.6.1)
Requirement already satisfied: absl-py>=0.4 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from tensorboard->detectron2) (0.13.0)
Requirement already satisfied: protobuf>=3.6.0 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from tensorboard->detectron2) (3.19.3)
Requirement already satisfied: wheel>=0.26 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from tensorboard->detectron2) (0.36.2)
Requirement already satisfied: markdown>=2.6.8 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from tensorboard->detectron2) (3.3.4)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from tensorboard->detectron2) (1.8.0)
Requirement already satisfied: requests<3,>=2.21.0 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from tensorboard->detectron2) (2.25.1)
Requirement already satisfied: six in /home/jeremy/anaconda3/lib/python3.7/site-packages (from absl-py>=0.4->tensorboard->detectron2) (1.15.0)
Requirement already satisfied: rsa<5,>=3.1.4 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from google-auth<3,>=1.6.3->tensorboard->detectron2) (4.7.2)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from google-auth<3,>=1.6.3->tensorboard->detectron2) (4.2.2)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from google-auth<3,>=1.6.3->tensorboard->detectron2) (0.2.8)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard->detectron2) (1.3.0)
Requirement already satisfied: importlib-metadata in /home/jeremy/anaconda3/lib/python3.7/site-packages (from markdown>=2.6.8->tensorboard->detectron2) (3.10.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2) (1.26.4)
Requirement already satisfied: idna<3,>=2.5 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2) (2021.10.8)
Requirement already satisfied: chardet<5,>=3.0.2 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2) (4.0.0)
Requirement already satisfied: zipp>=0.4 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from importlib-resources->hydra-core>=1.1->detectron2) (3.4.1)
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard->detectron2) (0.4.8)
Requirement already satisfied: oauthlib>=3.0.0 in /home/jeremy/anaconda3/lib/python3.7/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard->detectron2) (3.1.0)
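The install command above composes the wheel index URL from the detected torch and CUDA versions (this is the "Looking in links:" line in the log). A minimal sketch of that construction:

```python
def detectron2_wheel_index(torch_version: str, cuda_suffix: str) -> str:
    """Build the detectron2 wheel index URL from torch.__version__
    (e.g. '1.10.0+cu102') and the CUDA suffix (e.g. 'cu102')."""
    # keep only the major.minor part of the torch version, as the pip cell does
    major_minor = ".".join(torch_version.split(".")[:2])
    return ("https://dl.fbaipublicfiles.com/detectron2/wheels/"
            f"{cuda_suffix}/torch{major_minor}/index.html")

print(detectron2_wheel_index("1.10.0+cu102", "cu102"))
# -> https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.10/index.html
```

If no wheel exists for your exact torch + CUDA pair, this URL will simply contain no matching package, which is why the comment in the install cell suggests switching to a supported pytorch version.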
# import some common libraries
import json
import os
import cv2
import random
import glob
import numpy as np
if IN_COLAB:
  from google.colab.patches import cv2_imshow
import matplotlib.pyplot as plt
%matplotlib inline
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
# is there a gpu
if torch.cuda.is_available():
    GPU = True
    print('gpu available')
else:
    GPU = False
    print('no gpu')
no gpu

Upload a labeled dataset.

The following code expects the dataset, in COCO format, to be packaged in a .zip file, for example: sample_dataset.zip
Note: please make sure there is no whitespace in your file path if you encounter file-not-found issues.
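Before uploading, it can help to sanity-check the zip locally. The following is a minimal sketch (`check_coco_zip` is a hypothetical helper, assuming the `train/annotations.json` and `valid/annotations.json` layout shown in the unzip output below):

```python
import json
import zipfile

def check_coco_zip(zip_path):
    """Return a list of problems found in a COCO dataset .zip,
    assuming a train/ and valid/ split each with an annotations.json."""
    problems = []
    if " " in zip_path:
        problems.append("path contains whitespace, which may break shell commands")
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        for split in ("train", "valid"):
            ann = [n for n in names if n.endswith(f"{split}/annotations.json")]
            if not ann:
                problems.append(f"missing {split}/annotations.json")
                continue
            try:
                coco = json.loads(zf.read(ann[0]))
            except json.JSONDecodeError:
                problems.append(f"{ann[0]} is not valid JSON")
                continue
            # a COCO instance file needs these three top-level fields
            for key in ("images", "annotations", "categories"):
                if key not in coco:
                    problems.append(f"{ann[0]} lacks the '{key}' field")
    return problems
```

An empty return value means the archive at least has the expected structure; it does not validate the annotations themselves.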

if IN_COLAB:
    from google.colab import files
else:
    from ipywidgets import FileUpload
    from IPython.display import display
    !jupyter nbextension enable --py widgetsnbextension
Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK
if IN_COLAB:
    uploaded = files.upload()
else:
    uploaded = FileUpload()

Running the following cell should display a clickable button to upload the .zip file. If no button appears, you may need to install or update Node.js.

display(uploaded)
if IN_COLAB:
    dataset =  list(uploaded.keys())[0]
else:
    dataset = list(uploaded.value.keys())[0]
if IN_COLAB:
    !unzip $dataset -d /content/
else:
    #TODO generalize this
    !unzip -o ../../sample_dataset/$dataset -d .
Archive:  ../../sample_dataset/sample_dataset_coco_dataset.zip
  inflating: ./sample_dataset_coco_dataset/data.yaml  
  inflating: ./sample_dataset_coco_dataset/train/JPEGImages/sample_video_00000001_804604.jpg  
  inflating: ./sample_dataset_coco_dataset/train/JPEGImages/sample_video_00000007_309927.jpg  
  inflating: ./sample_dataset_coco_dataset/train/JPEGImages/sample_video_00000008_187470.jpg  
  inflating: ./sample_dataset_coco_dataset/train/JPEGImages/sample_video_00000009_225348.jpg  
  inflating: ./sample_dataset_coco_dataset/train/JPEGImages/sample_video_00000010_267263.jpg  
  inflating: ./sample_dataset_coco_dataset/train/JPEGImages/sample_video_000000315.jpg  
  inflating: ./sample_dataset_coco_dataset/train/JPEGImages/sample_video_00000525_203264.jpg  
  inflating: ./sample_dataset_coco_dataset/train/JPEGImages/sample_video_00000585_472082.jpg  
  inflating: ./sample_dataset_coco_dataset/train/JPEGImages/sample_video_00000870_465838.jpg  
  inflating: ./sample_dataset_coco_dataset/train/JPEGImages/sample_video_00001095_243935.jpg  
  inflating: ./sample_dataset_coco_dataset/train/JPEGImages/sample_video_00001110_212911.jpg  
  inflating: ./sample_dataset_coco_dataset/train/annotations.json  
  inflating: ./sample_dataset_coco_dataset/valid/JPEGImages/sample_video_00000002_167831.jpg  
  inflating: ./sample_dataset_coco_dataset/valid/JPEGImages/sample_video_00000030_212561.jpg  
  inflating: ./sample_dataset_coco_dataset/valid/JPEGImages/sample_video_00000135_201080.jpg  
  inflating: ./sample_dataset_coco_dataset/valid/JPEGImages/sample_video_00000405_247001.jpg  
  inflating: ./sample_dataset_coco_dataset/valid/JPEGImages/sample_video_00000630_460792.jpg  
  inflating: ./sample_dataset_coco_dataset/valid/JPEGImages/sample_video_00000826_202697.jpg  
  inflating: ./sample_dataset_coco_dataset/valid/JPEGImages/sample_video_00001050_552184.jpg  
  inflating: ./sample_dataset_coco_dataset/valid/annotations.json  

If your dataset has the same name as the file you uploaded, you do not need to input the name manually (just run the next cells). Otherwise, replace DATASET_NAME and DATASET_DIR with your own strings, e.g. DATASET_NAME = "NameOfMyDataset" and DATASET_DIR = "NameOfMyDatasetDirectory". To do that, uncomment the commented-out lines in the cell below and replace the strings with the appropriate names.

DATASET_NAME = DATASET_DIR = f"{dataset.replace('.zip','')}"
# DATASET_NAME = 'NameOfMyDataset' 
# DATASET_DIR = 'NameOfMyDatasetDirectory'
DATASET_NAME
'sample_dataset_coco_dataset'
DATASET_DIR
'sample_dataset_coco_dataset'

Run a pre-trained detectron2 model

First, we check a randomly selected image from our training dataset:

# select and display one random image from the training set
img_file = random.choice(glob.glob(f"{DATASET_DIR}/train/JPEGImages/*.*"))
im = cv2.imread(img_file)
if IN_COLAB:
    cv2_imshow(im)
else:
    plt.imshow(im)
../_images/Annolid_on_Detectron2_Tutorial_22_0.png

Then, we create a detectron2 config and a detectron2 DefaultPredictor to run inference on this image.

cfg = get_cfg()
if GPU:
    pass
else:
    cfg.MODEL.DEVICE='cpu'
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.1  # set threshold for this model
# Find a model from Detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(im)
/home/jeremy/anaconda3/envs/annolid/lib/python3.7/site-packages/detectron2/structures/image_list.py:88: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  max_size = (max_size + (stride - 1)) // stride * stride
/home/jeremy/anaconda3/envs/annolid/lib/python3.7/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
# look at the outputs. See https://detectron2.readthedocs.io/tutorials/models.html#model-output-format for specification
print(outputs["instances"].pred_classes)
print(outputs["instances"].pred_boxes)
tensor([32, 21])
Boxes(tensor([[ 84.5660, 335.6721, 176.6287, 424.9145],
        [151.0393,  89.9211, 501.5596, 355.7820]]))
outputs['instances'].pred_masks
tensor([[[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]],

        [[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]]])
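pred_masks is a boolean tensor of shape (num_instances, H, W), one mask per detected instance. As a quick sketch of how to consume it, per-instance pixel areas can be computed by summing each mask (shown here with a toy NumPy array rather than real model output):

```python
import numpy as np

def mask_areas(pred_masks):
    """Per-instance pixel counts from a stack of boolean masks
    shaped (num_instances, H, W), like outputs['instances'].pred_masks."""
    masks = np.asarray(pred_masks)
    # flatten each mask and count its True pixels
    return masks.reshape(masks.shape[0], -1).sum(axis=1)

# toy example: two 4x4 masks, the first with 3 foreground pixels
toy = np.zeros((2, 4, 4), dtype=bool)
toy[0, 0, :3] = True
print(mask_areas(toy))  # -> [3 0]
```

With real model output you would first move the tensor to CPU, e.g. `outputs["instances"].pred_masks.cpu().numpy()`.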
MetadataCatalog.get(cfg.DATASETS.TRAIN[0])
namespace(name='coco_2017_train',
          json_file='datasets/coco/annotations/instances_train2017.json',
          image_root='datasets/coco/train2017',
          evaluator_type='coco',
          thing_dataset_id_to_contiguous_id={1: 0,
                                             2: 1,
                                             3: 2,
                                             4: 3,
                                             5: 4,
                                             6: 5,
                                             7: 6,
                                             8: 7,
                                             9: 8,
                                             10: 9,
                                             11: 10,
                                             13: 11,
                                             14: 12,
                                             15: 13,
                                             16: 14,
                                             17: 15,
                                             18: 16,
                                             19: 17,
                                             20: 18,
                                             21: 19,
                                             22: 20,
                                             23: 21,
                                             24: 22,
                                             25: 23,
                                             27: 24,
                                             28: 25,
                                             31: 26,
                                             32: 27,
                                             33: 28,
                                             34: 29,
                                             35: 30,
                                             36: 31,
                                             37: 32,
                                             38: 33,
                                             39: 34,
                                             40: 35,
                                             41: 36,
                                             42: 37,
                                             43: 38,
                                             44: 39,
                                             46: 40,
                                             47: 41,
                                             48: 42,
                                             49: 43,
                                             50: 44,
                                             51: 45,
                                             52: 46,
                                             53: 47,
                                             54: 48,
                                             55: 49,
                                             56: 50,
                                             57: 51,
                                             58: 52,
                                             59: 53,
                                             60: 54,
                                             61: 55,
                                             62: 56,
                                             63: 57,
                                             64: 58,
                                             65: 59,
                                             67: 60,
                                             70: 61,
                                             72: 62,
                                             73: 63,
                                             74: 64,
                                             75: 65,
                                             76: 66,
                                             77: 67,
                                             78: 68,
                                             79: 69,
                                             80: 70,
                                             81: 71,
                                             82: 72,
                                             84: 73,
                                             85: 74,
                                             86: 75,
                                             87: 76,
                                             88: 77,
                                             89: 78,
                                             90: 79},
          thing_classes=['person',
                         'bicycle',
                         'car',
                         'motorcycle',
                         'airplane',
                         'bus',
                         'train',
                         'truck',
                         'boat',
                         'traffic light',
                         'fire hydrant',
                         'stop sign',
                         'parking meter',
                         'bench',
                         'bird',
                         'cat',
                         'dog',
                         'horse',
                         'sheep',
                         'cow',
                         'elephant',
                         'bear',
                         'zebra',
                         'giraffe',
                         'backpack',
                         'umbrella',
                         'handbag',
                         'tie',
                         'suitcase',
                         'frisbee',
                         'skis',
                         'snowboard',
                         'sports ball',
                         'kite',
                         'baseball bat',
                         'baseball glove',
                         'skateboard',
                         'surfboard',
                         'tennis racket',
                         'bottle',
                         'wine glass',
                         'cup',
                         'fork',
                         'knife',
                         'spoon',
                         'bowl',
                         'banana',
                         'apple',
                         'sandwich',
                         'orange',
                         'broccoli',
                         'carrot',
                         'hot dog',
                         'pizza',
                         'donut',
                         'cake',
                         'chair',
                         'couch',
                         'potted plant',
                         'bed',
                         'dining table',
                         'toilet',
                         'tv',
                         'laptop',
                         'mouse',
                         'remote',
                         'keyboard',
                         'cell phone',
                         'microwave',
                         'oven',
                         'toaster',
                         'sink',
                         'refrigerator',
                         'book',
                         'clock',
                         'vase',
                         'scissors',
                         'teddy bear',
                         'hair drier',
                         'toothbrush'],
          thing_colors=[[220, 20, 60],
                        [119, 11, 32],
                        [0, 0, 142],
                        [0, 0, 230],
                        [106, 0, 228],
                        [0, 60, 100],
                        [0, 80, 100],
                        [0, 0, 70],
                        [0, 0, 192],
                        [250, 170, 30],
                        [100, 170, 30],
                        [220, 220, 0],
                        [175, 116, 175],
                        [250, 0, 30],
                        [165, 42, 42],
                        [255, 77, 255],
                        [0, 226, 252],
                        [182, 182, 255],
                        [0, 82, 0],
                        [120, 166, 157],
                        [110, 76, 0],
                        [174, 57, 255],
                        [199, 100, 0],
                        [72, 0, 118],
                        [255, 179, 240],
                        [0, 125, 92],
                        [209, 0, 151],
                        [188, 208, 182],
                        [0, 220, 176],
                        [255, 99, 164],
                        [92, 0, 73],
                        [133, 129, 255],
                        [78, 180, 255],
                        [0, 228, 0],
                        [174, 255, 243],
                        [45, 89, 255],
                        [134, 134, 103],
                        [145, 148, 174],
                        [255, 208, 186],
                        [197, 226, 255],
                        [171, 134, 1],
                        [109, 63, 54],
                        [207, 138, 255],
                        [151, 0, 95],
                        [9, 80, 61],
                        [84, 105, 51],
                        [74, 65, 105],
                        [166, 196, 102],
                        [208, 195, 210],
                        [255, 109, 65],
                        [0, 143, 149],
                        [179, 0, 194],
                        [209, 99, 106],
                        [5, 121, 0],
                        [227, 255, 205],
                        [147, 186, 208],
                        [153, 69, 1],
                        [3, 95, 161],
                        [163, 255, 0],
                        [119, 0, 170],
                        [0, 182, 199],
                        [0, 165, 120],
                        [183, 130, 88],
                        [95, 32, 0],
                        [130, 114, 135],
                        [110, 129, 133],
                        [166, 74, 118],
                        [219, 142, 185],
                        [79, 210, 114],
                        [178, 90, 62],
                        [65, 70, 15],
                        [127, 167, 115],
                        [59, 105, 106],
                        [142, 108, 45],
                        [196, 172, 0],
                        [95, 54, 80],
                        [128, 76, 255],
                        [201, 57, 1],
                        [246, 0, 122],
                        [191, 162, 208]])
# We can use `Visualizer` to draw the predictions on the image.
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
if IN_COLAB:
    cv2_imshow(out.get_image()[:, :, ::-1])
else:
    plt.imshow(out.get_image()[:, :, ::-1])
../_images/Annolid_on_Detectron2_Tutorial_30_0.png

As we can see, the network doesn’t detect what we want. That is expected, as we have not fine-tuned the network on our custom dataset. We are going to do that in the next steps.

Train on a custom dataset

In this section, we show how to train an existing detectron2 model on a custom dataset in COCO format.

Prepare the dataset

Register the custom dataset with detectron2, following the detectron2 custom dataset tutorial. Here the dataset is already in COCO format, so we can register it with detectron2’s built-in COCO loader. For a dataset in a custom format, you would need to write your own loading function; see the tutorial for details.
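For illustration, here is a hedged sketch of such a loading function for a non-COCO dataset. Everything in it (file name, image size, box coordinates) is hypothetical; only the dict keys follow detectron2's standard dataset format:

```python
def get_my_dicts():
    """Hypothetical loader returning detectron2's standard dataset dicts.
    Each dict needs file_name, image_id, height, width, and annotations."""
    return [
        {
            "file_name": "images/frame_0001.jpg",  # hypothetical path
            "image_id": 0,
            "height": 480,
            "width": 640,
            "annotations": [
                {
                    # absolute pixel coordinates, (x1, y1, x2, y2)
                    "bbox": [100.0, 120.0, 200.0, 240.0],
                    "bbox_mode": 0,  # 0 == BoxMode.XYXY_ABS in detectron2
                    "category_id": 0,
                    # one polygon: a flat list of [x1, y1, x2, y2, ...]
                    "segmentation": [[100, 120, 200, 120, 200, 240, 100, 240]],
                }
            ],
        }
    ]

# With detectron2 installed, this function would then be registered as:
#   from detectron2.data import DatasetCatalog
#   DatasetCatalog.register("my_custom_dataset", get_my_dicts)
```

Since our dataset is already COCO-formatted, the next cell uses register_coco_instances instead, which generates such dicts for us.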

from detectron2.data.datasets import register_coco_instances
from detectron2.data import get_detection_dataset_dicts
from detectron2.data.datasets import  builtin_meta
# Guard against "Dataset ... is already registered" errors when re-running this cell
for d in ("train", "valid"):
    if f"{DATASET_NAME}_{d}" in DatasetCatalog.list():
        DatasetCatalog.remove(f"{DATASET_NAME}_{d}")
        MetadataCatalog.remove(f"{DATASET_NAME}_{d}")
register_coco_instances(f"{DATASET_NAME}_train", {}, f"{DATASET_DIR}/train/annotations.json", f"{DATASET_DIR}/train/")
register_coco_instances(f"{DATASET_NAME}_valid", {}, f"{DATASET_DIR}/valid/annotations.json", f"{DATASET_DIR}/valid/")
dataset_dicts = get_detection_dataset_dicts([f"{DATASET_NAME}_train"])
[01/13 09:38:48 d2.data.datasets.coco]: Loaded 11 images in COCO format from sample_dataset_coco_dataset/train/annotations.json
[01/13 09:38:48 d2.data.build]: Removed 0 images with no usable annotations. 11 images left.
[01/13 09:38:48 d2.data.build]: Distribution of instances among all 16 categories:
|   category    | #instances   |   category    | #instances   |  category  | #instances   |
|:-------------:|:-------------|:-------------:|:-------------|:----------:|:-------------|
| _background_  | 0            |     mouse     | 9            |    body    | 0            |
| body_centroid | 0            |   left_ear    | 11           | right_ear  | 11           |
|     nose      | 11           |     head      | 0            |    wall    | 0            |
|    corner     | 0            | base_of_tail  | 11           |  grooming  | 0            |
|    rearing    | 0            | object_inve.. | 2            |  tea_ball  | 9            |
|     hand      | 0            |               |              |            |              |
|     total     | 64           |               |              |            |              |
_dataset_metadata = MetadataCatalog.get(f"{DATASET_NAME}_train")
_dataset_metadata.thing_colors = [cc['color'] for cc in builtin_meta.COCO_CATEGORIES]
_dataset_metadata
namespace(name='sample_dataset_coco_dataset_train',
          json_file='sample_dataset_coco_dataset/train/annotations.json',
          image_root='sample_dataset_coco_dataset/train/',
          evaluator_type='coco',
          thing_classes=['_background_',
                         'mouse',
                         'body',
                         'body_centroid',
                         'left_ear',
                         'right_ear',
                         'nose',
                         'head',
                         'wall',
                         'corner',
                         'base_of_tail',
                         'grooming',
                         'rearing',
                         'object_investigation',
                         'tea_ball',
                         'hand'],
          thing_dataset_id_to_contiguous_id={0: 0,
                                             1: 1,
                                             2: 2,
                                             3: 3,
                                             4: 4,
                                             5: 5,
                                             6: 6,
                                             7: 7,
                                             8: 8,
                                             9: 9,
                                             10: 10,
                                             11: 11,
                                             12: 12,
                                             13: 13,
                                             14: 14,
                                             15: 15},
          thing_colors=[[220, 20, 60],
                        [119, 11, 32],
                        [0, 0, 142],
                        [0, 0, 230],
                        [106, 0, 228],
                        [0, 60, 100],
                        [0, 80, 100],
                        [0, 0, 70],
                        [0, 0, 192],
                        [250, 170, 30],
                        [100, 170, 30],
                        [220, 220, 0],
                        [175, 116, 175],
                        [250, 0, 30],
                        [165, 42, 42],
                        [255, 77, 255],
                        [0, 226, 252],
                        [182, 182, 255],
                        [0, 82, 0],
                        [120, 166, 157],
                        [110, 76, 0],
                        [174, 57, 255],
                        [199, 100, 0],
                        [72, 0, 118],
                        [255, 179, 240],
                        [0, 125, 92],
                        [209, 0, 151],
                        [188, 208, 182],
                        [0, 220, 176],
                        [255, 99, 164],
                        [92, 0, 73],
                        [133, 129, 255],
                        [78, 180, 255],
                        [0, 228, 0],
                        [174, 255, 243],
                        [45, 89, 255],
                        [134, 134, 103],
                        [145, 148, 174],
                        [255, 208, 186],
                        [197, 226, 255],
                        [171, 134, 1],
                        [109, 63, 54],
                        [207, 138, 255],
                        [151, 0, 95],
                        [9, 80, 61],
                        [84, 105, 51],
                        [74, 65, 105],
                        [166, 196, 102],
                        [208, 195, 210],
                        [255, 109, 65],
                        [0, 143, 149],
                        [179, 0, 194],
                        [209, 99, 106],
                        [5, 121, 0],
                        [227, 255, 205],
                        [147, 186, 208],
                        [153, 69, 1],
                        [3, 95, 161],
                        [163, 255, 0],
                        [119, 0, 170],
                        [0, 182, 199],
                        [0, 165, 120],
                        [183, 130, 88],
                        [95, 32, 0],
                        [130, 114, 135],
                        [110, 129, 133],
                        [166, 74, 118],
                        [219, 142, 185],
                        [79, 210, 114],
                        [178, 90, 62],
                        [65, 70, 15],
                        [127, 167, 115],
                        [59, 105, 106],
                        [142, 108, 45],
                        [196, 172, 0],
                        [95, 54, 80],
                        [128, 76, 255],
                        [201, 57, 1],
                        [246, 0, 122],
                        [191, 162, 208],
                        [255, 255, 128],
                        [147, 211, 203],
                        [150, 100, 100],
                        [168, 171, 172],
                        [146, 112, 198],
                        [210, 170, 100],
                        [92, 136, 89],
                        [218, 88, 184],
                        [241, 129, 0],
                        [217, 17, 255],
                        [124, 74, 181],
                        [70, 70, 70],
                        [255, 228, 255],
                        [154, 208, 0],
                        [193, 0, 92],
                        [76, 91, 113],
                        [255, 180, 195],
                        [106, 154, 176],
                        [230, 150, 140],
                        [60, 143, 255],
                        [128, 64, 128],
                        [92, 82, 55],
                        [254, 212, 124],
                        [73, 77, 174],
                        [255, 160, 98],
                        [255, 255, 255],
                        [104, 84, 109],
                        [169, 164, 131],
                        [225, 199, 255],
                        [137, 54, 74],
                        [135, 158, 223],
                        [7, 246, 231],
                        [107, 255, 200],
                        [58, 41, 149],
                        [183, 121, 142],
                        [255, 73, 97],
                        [107, 142, 35],
                        [190, 153, 153],
                        [146, 139, 141],
                        [70, 130, 180],
                        [134, 199, 156],
                        [209, 226, 140],
                        [96, 36, 108],
                        [96, 96, 96],
                        [64, 170, 64],
                        [152, 251, 152],
                        [208, 229, 228],
                        [206, 186, 171],
                        [152, 161, 64],
                        [116, 112, 0],
                        [0, 114, 143],
                        [102, 102, 156],
                        [250, 141, 255]])
NUM_CLASSES = len(_dataset_metadata.thing_classes)
print(f"Number of classes in the dataset: {NUM_CLASSES}")
Number of classes in the dataset: 16

To verify that the data loading is correct, let’s visualize the annotations of a few randomly selected samples from the training set:

for d in random.sample(dataset_dicts, 2):
    if '\\' in d['file_name']:
        d['file_name'] = d['file_name'].replace('\\', '/')
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=_dataset_metadata, scale=0.5)
    out = visualizer.draw_dataset_dict(d)
    if IN_COLAB:
        cv2_imshow(out.get_image()[:, :, ::-1])
    else:
        plt.imshow(out.get_image()[:, :, ::-1])
        plt.show()
../_images/Annolid_on_Detectron2_Tutorial_42_0.png

Train!

Now, let’s fine-tune the COCO-pretrained R50-FPN Mask R-CNN model on our custom dataset. Training for 3000 iterations takes roughly 2 hours on Colab’s K80 GPU, or about 1.5 hours on a P100 GPU.
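As a quick sanity check on the schedule: at 8 images per batch, 3000 iterations means the trainer revisits this tiny 11-image training set a couple of thousand times (the batch size and iteration count come from the config cell below; the image count from the registration output above):

```python
MAX_ITER = 3000        # cfg.SOLVER.MAX_ITER
IMS_PER_BATCH = 8      # cfg.SOLVER.IMS_PER_BATCH
NUM_TRAIN_IMAGES = 11  # from the dataset registration log

images_seen = MAX_ITER * IMS_PER_BATCH
epochs = images_seen / NUM_TRAIN_IMAGES
print(f"{images_seen} images seen, ~{epochs:.0f} passes over the training set")
```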

if GPU:
    !nvidia-smi
from detectron2.engine import DefaultTrainer
cfg = get_cfg()
if GPU:
    pass
else:
    cfg.MODEL.DEVICE='cpu'
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = (f"{DATASET_NAME}_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2 #@param
cfg.DATALOADER.SAMPLER_TRAIN = "RepeatFactorTrainingSampler"
cfg.DATALOADER.REPEAT_THRESHOLD = 0.3
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH =  8 #@param
cfg.SOLVER.BASE_LR = 0.0025 #@param # pick a good LR
cfg.SOLVER.MAX_ITER = 3000 #@param    # 3000 iterations is enough for this small dataset; you will need to train longer for a practical dataset
cfg.SOLVER.CHECKPOINT_PERIOD = 1000 #@param 
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128 #@param   # faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = NUM_CLASSES  #  (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
[01/13 09:39:12 d2.engine.defaults]: Model:
GeneralizedRCNN(
  (backbone): FPN(
    (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (top_block): LastLevelMaxPool()
    (bottom_up): ResNet(
      (stem): BasicStem(
        (conv1): Conv2d(
          3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
      )
      (res2): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv1): Conv2d(
            64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv2): Conv2d(
            64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv3): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv2): Conv2d(
            64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv3): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv2): Conv2d(
            64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv3): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
        )
      )
      (res3): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv1): Conv2d(
            256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
        (3): BottleneckBlock(
          (conv1): Conv2d(
            512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
      )
      (res4): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
          (conv1): Conv2d(
            512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (3): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (4): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (5): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
      )
      (res5): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
          (conv1): Conv2d(
            1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv2): Conv2d(
            512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv3): Conv2d(
            512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv2): Conv2d(
            512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv3): Conv2d(
            512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv2): Conv2d(
            512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv3): Conv2d(
            512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
        )
      )
    )
  )
  (proposal_generator): RPN(
    (rpn_head): StandardRPNHead(
      (conv): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
        (activation): ReLU()
      )
      (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
    (anchor_generator): DefaultAnchorGenerator(
      (cell_anchors): BufferList()
    )
  )
  (roi_heads): StandardROIHeads(
    (box_pooler): ROIPooler(
      (level_poolers): ModuleList(
        (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True)
        (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True)
        (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
        (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
      )
    )
    (box_head): FastRCNNConvFCHead(
      (flatten): Flatten(start_dim=1, end_dim=-1)
      (fc1): Linear(in_features=12544, out_features=1024, bias=True)
      (fc_relu1): ReLU()
      (fc2): Linear(in_features=1024, out_features=1024, bias=True)
      (fc_relu2): ReLU()
    )
    (box_predictor): FastRCNNOutputLayers(
      (cls_score): Linear(in_features=1024, out_features=17, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=64, bias=True)
    )
    (mask_pooler): ROIPooler(
      (level_poolers): ModuleList(
        (0): ROIAlign(output_size=(14, 14), spatial_scale=0.25, sampling_ratio=0, aligned=True)
        (1): ROIAlign(output_size=(14, 14), spatial_scale=0.125, sampling_ratio=0, aligned=True)
        (2): ROIAlign(output_size=(14, 14), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
        (3): ROIAlign(output_size=(14, 14), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
      )
    )
    (mask_head): MaskRCNNConvUpsampleHead(
      (mask_fcn1): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
        (activation): ReLU()
      )
      (mask_fcn2): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
        (activation): ReLU()
      )
      (mask_fcn3): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
        (activation): ReLU()
      )
      (mask_fcn4): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
        (activation): ReLU()
      )
      (deconv): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2))
      (deconv_relu): ReLU()
      (predictor): Conv2d(256, 16, kernel_size=(1, 1), stride=(1, 1))
    )
  )
)
[01/13 09:39:12 d2.data.datasets.coco]: Loaded 11 images in COCO format from sample_dataset_coco_dataset/train/annotations.json
[01/13 09:39:12 d2.data.build]: Removed 0 images with no usable annotations. 11 images left.
[01/13 09:39:12 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[01/13 09:39:12 d2.data.build]: Using training sampler RepeatFactorTrainingSampler
[01/13 09:39:12 d2.data.common]: Serializing 11 elements to byte tensors and concatenating them all ...
[01/13 09:39:12 d2.data.common]: Serialized dataset takes 0.03 MiB
WARNING [01/13 09:39:12 d2.solver.build]: SOLVER.STEPS contains values larger than SOLVER.MAX_ITER. These values will be ignored.
Skip loading parameter 'roi_heads.box_predictor.cls_score.weight' to the model due to incompatible shapes: (81, 1024) in the checkpoint but (17, 1024) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.box_predictor.cls_score.bias' to the model due to incompatible shapes: (81,) in the checkpoint but (17,) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.box_predictor.bbox_pred.weight' to the model due to incompatible shapes: (320, 1024) in the checkpoint but (64, 1024) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.box_predictor.bbox_pred.bias' to the model due to incompatible shapes: (320,) in the checkpoint but (64,) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.mask_head.predictor.weight' to the model due to incompatible shapes: (80, 256, 1, 1) in the checkpoint but (16, 256, 1, 1) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.mask_head.predictor.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (16,) in the model! You might want to double check if this is expected.
Some model parameters or buffers are not found in the checkpoint:
roi_heads.box_predictor.bbox_pred.{bias, weight}
roi_heads.box_predictor.cls_score.{bias, weight}
roi_heads.mask_head.predictor.{bias, weight}
# Look at training curves in tensorboard:
%load_ext tensorboard
%tensorboard --logdir output
trainer.train()
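A note on the RepeatFactorTrainingSampler configured above: it oversamples images containing rare categories. In the scheme detectron2 implements (from the LVIS paper), each category gets a repeat factor of max(1, sqrt(threshold / frequency)), where frequency is the fraction of training images containing that category, and each image is repeated by the largest factor among its categories. A small sketch with hypothetical frequencies:

```python
import math

def category_repeat_factor(freq, threshold=0.3):
    # freq: fraction of training images containing the category
    # threshold: cfg.DATALOADER.REPEAT_THRESHOLD
    return max(1.0, math.sqrt(threshold / freq))

# hypothetical: a category in 2 of 11 images vs. one present in every image
print(category_repeat_factor(2 / 11))   # > 1: images with it get oversampled
print(category_repeat_factor(11 / 11))  # clipped to 1: no oversampling
```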

Inference & evaluation using the trained model

Now, let’s run inference with the trained model on the validation dataset. First, let’s create a predictor using the model we just trained:

# Inference should use the config with parameters that are used in training
# cfg now already contains everything we've set previously. 
# We simply update the weights with the newly trained ones to perform inference:
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
# set a custom testing threshold
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.15   #@param {type: "slider", min:0.0, max:1.0, step: 0.01}
predictor = DefaultPredictor(cfg)

Then, we randomly select several samples to visualize the prediction results.

from detectron2.utils.visualizer import ColorMode
dataset_dicts = get_detection_dataset_dicts([f"{DATASET_NAME}_valid"])
for d in random.sample(dataset_dicts, 4):    
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)  # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
    v = Visualizer(im[:, :, ::-1],
                   metadata=_dataset_metadata, 
                   scale=0.5, 
                   instance_mode=ColorMode.SEGMENTATION   # draw instances of the same category in similar colors, taken from metadata.thing_colors
    )
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    if IN_COLAB:
        cv2_imshow(out.get_image()[:, :, ::-1])
    else:
        plt.imshow(out.get_image()[:, :, ::-1])
        plt.show()
        
[01/13 09:39:36 d2.data.datasets.coco]: Loaded 7 images in COCO format from sample_dataset_coco_dataset/valid/annotations.json
[01/13 09:39:36 d2.data.build]: Removed 0 images with no usable annotations. 7 images left.
[01/13 09:39:36 d2.data.build]: Distribution of instances among all 16 categories:
|   category    | #instances   |   category    | #instances   |  category  | #instances   |
|:-------------:|:-------------|:-------------:|:-------------|:----------:|:-------------|
| _background_  | 0            |     mouse     | 7            |    body    | 0            |
| body_centroid | 0            |   left_ear    | 5            | right_ear  | 5            |
|     nose      | 5            |     head      | 0            |    wall    | 0            |
|    corner     | 0            | base_of_tail  | 5            |  grooming  | 0            |
|    rearing    | 0            | object_inve.. | 2            |  tea_ball  | 7            |
|     hand      | 0            |               |              |            |              |
|     total     | 36           |               |              |            |              |
/home/jeremy/anaconda3/envs/annolid/lib/python3.7/site-packages/detectron2/structures/image_list.py:88: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  max_size = (max_size + (stride - 1)) // stride * stride
../_images/Annolid_on_Detectron2_Tutorial_55_2.png ../_images/Annolid_on_Detectron2_Tutorial_55_3.png ../_images/Annolid_on_Detectron2_Tutorial_55_4.png ../_images/Annolid_on_Detectron2_Tutorial_55_5.png

A more robust way to evaluate the model is the Average Precision (AP) metric, already implemented in the detectron2 package. For a detailed explanation of AP, see the COCO detection evaluation page.

In short: a detection counts as a true positive when its IoU with a ground-truth box exceeds a threshold, and AP is the area under the resulting precision/recall curve. The headline COCO AP averages this over IoU thresholds from 0.50 to 0.95; AP50 and AP75 fix the threshold at 0.50 and 0.75; APs, APm, and APl restrict the evaluation to small, medium, and large objects.
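To make AP concrete, here is a minimal pure-Python sketch for a single class, using the all-point interpolation that the COCO API uses: rank detections by confidence, accumulate precision and recall, take the precision envelope, and integrate it over recall.

```python
def average_precision(scores, is_tp, num_gt):
    """AP for one class: area under the interpolated precision/recall curve."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    prec, rec = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        prec.append(tp / (tp + fp))
        rec.append(tp / num_gt)
    # precision envelope: best precision achievable at this recall or higher
    for k in range(len(prec) - 2, -1, -1):
        prec[k] = max(prec[k], prec[k + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(prec, rec):
        ap += p * (r - prev_r)  # recall only advances on true positives
        prev_r = r
    return ap

# 3 detections ranked by confidence; 2 ground-truth boxes for this class:
# 0.5 recall at precision 1, then the remaining 0.5 at precision 2/3 -> 5/6
print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2))
```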

from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
if IN_COLAB:
    evaluator = COCOEvaluator(f"{DATASET_NAME}_valid", output_dir="/content/eval_output/")
else:
    evaluator = COCOEvaluator(f"{DATASET_NAME}_valid", output_dir="eval_output/")

val_loader = build_detection_test_loader(cfg, f"{DATASET_NAME}_valid")
print(inference_on_dataset(predictor.model, val_loader, evaluator))
# another equivalent way to evaluate the model is to use `trainer.test`
WARNING [01/13 12:53:32 d2.evaluation.coco_evaluation]: COCO Evaluator instantiated using config, this is deprecated behavior. Please pass in explicit arguments instead.
[01/13 12:53:33 d2.data.datasets.coco]: Loaded 7 images in COCO format from sample_dataset_coco_dataset/valid/annotations.json
[01/13 12:53:33 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[01/13 12:53:33 d2.data.common]: Serializing 7 elements to byte tensors and concatenating them all ...
[01/13 12:53:33 d2.data.common]: Serialized dataset takes 0.02 MiB
[01/13 12:53:33 d2.evaluation.evaluator]: Start inference on 7 batches
[01/13 12:53:48 d2.evaluation.evaluator]: Total inference time: 0:00:03.334486 (1.667243 s / iter per device, on 1 devices)
[01/13 12:53:48 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:03 (1.638602 s / iter per device, on 1 devices)
[01/13 12:53:48 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[01/13 12:53:48 d2.evaluation.coco_evaluation]: Saving results to eval_output/coco_instances_results.json
[01/13 12:53:48 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
[01/13 12:53:48 d2.evaluation.fast_eval_api]: Evaluate annotation type *bbox*
[01/13 12:53:48 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 0.02 seconds.
[01/13 12:53:48 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[01/13 12:53:48 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 0.03 seconds.
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.347
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.511
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.392
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.286
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.566
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.202
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.460
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.466
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.466
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.430
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.571
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.400
[01/13 12:53:48 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 34.728 | 51.104 | 39.231 | 28.557 | 56.584 | 20.198 |
[01/13 12:53:48 d2.evaluation.coco_evaluation]: Per-category bbox AP: 
| category      | AP     | category             | AP     | category   | AP     |
|:--------------|:-------|:---------------------|:-------|:-----------|:-------|
| _background_  | nan    | mouse                | 44.211 | body       | nan    |
| body_centroid | nan    | left_ear             | 37.320 | right_ear  | 17.277 |
| nose          | 45.017 | head                 | nan    | wall       | nan    |
| corner        | nan    | base_of_tail         | 14.614 | grooming   | nan    |
| rearing       | nan    | object_investigation | 20.198 | tea_ball   | 64.455 |
| hand          | nan    |                      |        |            |        |
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
[01/13 12:53:48 d2.evaluation.fast_eval_api]: Evaluate annotation type *segm*
[01/13 12:53:48 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 0.00 seconds.
[01/13 12:53:48 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[01/13 12:53:48 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 0.03 seconds.
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.334
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.541
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.364
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.251
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.598
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.151
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.436
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.439
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.439
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.390
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.607
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.300
[01/13 12:53:48 d2.evaluation.coco_evaluation]: Evaluation results for segm: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 33.410 | 54.098 | 36.351 | 25.052 | 59.805 | 15.149 |
[01/13 12:53:48 d2.evaluation.coco_evaluation]: Per-category segm AP: 
| category      | AP     | category             | AP     | category   | AP     |
|:--------------|:-------|:---------------------|:-------|:-----------|:-------|
| _background_  | nan    | mouse                | 52.871 | body       | nan    |
| body_centroid | nan    | left_ear             | 33.085 | right_ear  | 18.267 |
| nose          | 37.657 | head                 | nan    | wall       | nan    |
| corner        | nan    | base_of_tail         | 11.198 | grooming   | nan    |
| rearing       | nan    | object_investigation | 15.149 | tea_ball   | 65.644 |
| hand          | nan    |                      |        |            |        |
OrderedDict([('bbox', {'AP': 34.72753193686715, 'AP50': 51.10415123144967, 'AP75': 39.23149457802923, 'APs': 28.557009272355806, 'APm': 56.58415841584158, 'APl': 20.198019801980198, 'AP-_background_': nan, 'AP-mouse': 44.21122112211221, 'AP-body': nan, 'AP-body_centroid': nan, 'AP-left_ear': 37.32044633034732, 'AP-right_ear': 17.277227722772277, 'AP-nose': 45.01650165016501, 'AP-head': nan, 'AP-wall': nan, 'AP-corner': nan, 'AP-base_of_tail': 14.613861386138613, 'AP-grooming': nan, 'AP-rearing': nan, 'AP-object_investigation': 20.198019801980198, 'AP-tea_ball': 64.45544554455445, 'AP-hand': nan}), ('segm', {'AP': 33.41007161940684, 'AP50': 54.09802204710267, 'AP75': 36.35077793493635, 'APs': 25.05178374980355, 'APm': 59.805280528052805, 'APl': 15.148514851485148, 'AP-_background_': nan, 'AP-mouse': 52.87128712871287, 'AP-body': nan, 'AP-body_centroid': nan, 'AP-left_ear': 33.08502278799308, 'AP-right_ear': 18.26732673267327, 'AP-nose': 37.65676567656765, 'AP-head': nan, 'AP-wall': nan, 'AP-corner': nan, 'AP-base_of_tail': 11.198019801980196, 'AP-grooming': nan, 'AP-rearing': nan, 'AP-object_investigation': 15.148514851485148, 'AP-tea_ball': 65.64356435643565, 'AP-hand': nan})])
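`inference_on_dataset` returns the nested `OrderedDict` printed above, so individual metrics can also be read programmatically instead of parsing the log. A small stand-in illustrating the access pattern (the numbers are copied from the run above; in the notebook you would use the return value directly):

```python
from collections import OrderedDict

# Stand-in for the OrderedDict returned by inference_on_dataset above.
results = OrderedDict([
    ("bbox", {"AP": 34.728, "AP50": 51.104, "AP-mouse": 44.211}),
    ("segm", {"AP": 33.410, "AP50": 54.098, "AP-mouse": 52.871}),
])
# Top-level keys are the tasks ("bbox", "segm"); per-category scores
# are stored under "AP-<category>".
print(f"bbox AP: {results['bbox']['AP']:.1f}")
print(f"segm AP for mouse: {results['segm']['AP-mouse']:.1f}")
```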

Let’s test our newly trained model on a new video

We download a video from a URL

#e.g.
#!wget https://hosting-website.com/your-video.mp4

Please change the VIDEO_INPUT to the path of your inference video

import os
import cv2

if IN_COLAB:
    VIDEO_INPUT = "/content/video60.mkv"
    OUTPUT_DIR = "/content/eval_output"
else:
    VIDEO_INPUT = '../../sample_dataset/sample_video.mp4'
    OUTPUT_DIR = "../../sample_dataset/"

video = cv2.VideoCapture(VIDEO_INPUT)
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
frames_per_second = video.get(cv2.CAP_PROP_FPS)
num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
basename = os.path.basename(VIDEO_INPUT)
os.makedirs(OUTPUT_DIR, exist_ok=True)
def _frame_from_video(video):
    # Yield frames one at a time; skip unreadable frames by seeking past
    # them, and give up after 2000 failed reads in total.
    attempt = 0
    for i in range(num_frames):
        success, frame = video.read()
        if success:
            yield frame
        else:
            attempt += 1
            if attempt >= 2000:
                break
            video.set(cv2.CAP_PROP_POS_FRAMES, i + 1)
            print('Cannot read frame:', i)
import pandas as pd
import pycocotools.mask as mask_util
class_names = _dataset_metadata.thing_classes
print(class_names)
['_background_', 'mouse', 'body', 'body_centroid', 'left_ear', 'right_ear', 'nose', 'head', 'wall', 'corner', 'base_of_tail', 'grooming', 'rearing', 'object_investigation', 'tea_ball', 'hand']
frame_number = 0
tracking_results = []
VIS = True
for frame in _frame_from_video(video): 
    im = frame
    outputs = predictor(im)
    out_dict = {}  
    instances = outputs["instances"].to("cpu")
    num_instance = len(instances)
    if num_instance == 0:
        out_dict['frame_number'] = frame_number
        out_dict['x1'] = None
        out_dict['y1'] = None
        out_dict['x2'] = None
        out_dict['y2'] = None
        out_dict['instance_name'] = None
        out_dict['class_score'] = None
        out_dict['segmentation'] = None
        tracking_results.append(out_dict)
        out_dict = {}
    else:
        boxes = instances.pred_boxes.tensor.numpy()
        boxes = boxes.tolist()
        scores = instances.scores.tolist()
        classes = instances.pred_classes.tolist()

        has_mask = instances.has("pred_masks")

        rles = None
        if has_mask:
            # Encode each predicted binary mask as a COCO run-length
            # encoding (RLE), which is compact enough to store per frame.
            rles = [
                mask_util.encode(np.array(mask[:, :, None], order="F", dtype="uint8"))[0]
                for mask in instances.pred_masks
            ]
            for rle in rles:
                rle["counts"] = rle["counts"].decode("utf-8")
            assert len(rles) == len(boxes)
        for k in range(num_instance):
            box = boxes[k]
            out_dict['frame_number'] = frame_number
            out_dict['x1'] = box[0]
            out_dict['y1'] = box[1]
            out_dict['x2'] = box[2]
            out_dict['y2'] = box[3]
            out_dict['instance_name'] = class_names[classes[k]]
            out_dict['class_score'] = scores[k]
            out_dict['segmentation'] = rles[k] if has_mask else None
            if frame_number % 1000 == 0:
              print(f"Frame number {frame_number}: {out_dict}")
            tracking_results.append(out_dict)
            out_dict = {}
        
    # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
    if VIS:
        v = Visualizer(im[:, :, ::-1],
                    metadata=_dataset_metadata, 
                    scale=0.5, 
                    instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels. This option is only available for segmentation models
         )
        out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
        out_image = out.get_image()[:, :, ::-1]
        if frame_number % 1000 == 0:
            if IN_COLAB:
                cv2_imshow(out_image)
            else:
                plt.imshow(out_image)
                plt.show()
            # Turn off the visualization to save time after the first displayed frame
            VIS = False
    frame_number += 1
    print(f"Processing frame number {frame_number}")

video.release()
/home/jeremy/anaconda3/envs/annolid/lib/python3.7/site-packages/detectron2/structures/image_list.py:88: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  max_size = (max_size + (stride - 1)) // stride * stride
Frame number 0: {'frame_number': 0, 'x1': 160.23683166503906, 'y1': 190.2261199951172, 'x2': 336.2402648925781, 'y2': 344.6754455566406, 'instance_name': 'mouse', 'class_score': 0.9991045594215393, 'segmentation': {'size': [480, 640], 'counts': '\\c\\21iZ20VTN1O00O10j>4RA0O10O10000O100000001N10000000PU5OQkJ1O0O2O0000001O002N1O1O1O0001O1O1O001O010O11O01OO1O010O1O100O1O01O2O10O000O02O00O2O0O1O01000_BIY<n1A1O0000O11O3M1N10O1000O103L2O0O01001O001OO01O1O005K3M1O001O1O1O3M2N100O1O1O2O0O10O10O0100O01O010004K5L1N1O100O2N;E2N001O1O1O7I2N1O1O1O2N4L2N1O1O1N3N3M8H2N2M2O2M7FPPa4'}}
Frame number 0: {'frame_number': 0, 'x1': 280.5991516113281, 'y1': 324.3965148925781, 'x2': 301.3257141113281, 'y2': 344.28997802734375, 'instance_name': 'right_ear', 'class_score': 0.9980742931365967, 'segmentation': {'size': [480, 640], 'counts': '[QT44k>3L4N1N2N2O1N10001O000001O1N101N3N1M5Kdbn4'}}
Frame number 0: {'frame_number': 0, 'x1': 245.53170776367188, 'y1': 228.6380157470703, 'x2': 268.7048034667969, 'y2': 248.2976531982422, 'instance_name': 'base_of_tail', 'class_score': 0.9973351359367371, 'segmentation': {'size': [480, 640], 'counts': '[ac36h>4M2N2N2O1N101O0000000000000001O1N2N2N3L4LeT^5'}}
Frame number 0: {'frame_number': 0, 'x1': 84.98242950439453, 'y1': 334.3118896484375, 'x2': 184.09133911132812, 'y2': 422.3860778808594, 'instance_name': 'tea_ball', 'class_score': 0.9971858859062195, 'segmentation': {'size': [480, 640], 'counts': 'PfX19e>3J9H8K2N2K6M3M2M3L5N1N2M3N2O0O2M2N3N2N1N200O2N1O2N1O2N100O1O1O1000000O10000000000000000O10000O01O10000O01000O1O1O10000O100O1O101N2N1O2O0O1O2N1O2O1M3O0O2N2N1O2N2N1O2N3M2N2N3L3M3N2N5G7L3N5Ieif6'}}
Frame number 0: {'frame_number': 0, 'x1': 304.2287902832031, 'y1': 303.8758850097656, 'x2': 325.7909851074219, 'y2': 327.376220703125, 'instance_name': 'left_ear', 'class_score': 0.9939261078834534, 'segmentation': {'size': [480, 640], 'counts': 'iX_47f>5M2N2N2N2N1O2O000000000000O101N2O1N3L5LYlb4'}}
Frame number 0: {'frame_number': 0, 'x1': 264.4808654785156, 'y1': 324.1116638183594, 'x2': 285.025146484375, 'y2': 343.5081787109375, 'instance_name': 'right_ear', 'class_score': 0.6853349804878235, 'segmentation': {'size': [480, 640], 'counts': 'Z_m38g>2N2O1N101N1000000001O0O101N2N2MhaV5'}}
Frame number 0: {'frame_number': 0, 'x1': 304.69866943359375, 'y1': 302.03546142578125, 'x2': 327.1500549316406, 'y2': 329.07427978515625, 'instance_name': 'right_ear', 'class_score': 0.25220856070518494, 'segmentation': {'size': [480, 640], 'counts': 'hX_49e>3N2N2M3O1N1O2O0O1000000000O2O1N2O1N2N3L7HX]b4'}}
../_images/Annolid_on_Detectron2_Tutorial_72_2.png
Processing frame number 1
Processing frame number 2
Processing frame number 3
...
Processing frame number 750
Processing frame number 751
Processing frame number 752
Processing frame number 753
Processing frame number 754
Processing frame number 755
Processing frame number 756
Processing frame number 757
Processing frame number 758
Processing frame number 759
Processing frame number 760
Processing frame number 761
Processing frame number 762
Processing frame number 763
Processing frame number 764
Processing frame number 765
Processing frame number 766
Processing frame number 767
Processing frame number 768
Processing frame number 769
Processing frame number 770
Processing frame number 771
Processing frame number 772
Processing frame number 773
Processing frame number 774
Processing frame number 775
Processing frame number 776
Processing frame number 777
Processing frame number 778
Processing frame number 779
Processing frame number 780
Processing frame number 781
Processing frame number 782
Processing frame number 783
Processing frame number 784
Processing frame number 785
Processing frame number 786
Processing frame number 787
Processing frame number 788
Processing frame number 789
Processing frame number 790
Processing frame number 791
Processing frame number 792
Processing frame number 793
Processing frame number 794
Processing frame number 795
Processing frame number 796
Processing frame number 797
Processing frame number 798
Processing frame number 799
Processing frame number 800
Processing frame number 801
Processing frame number 802
Processing frame number 803
Processing frame number 804
Processing frame number 805
Processing frame number 806
Processing frame number 807
Processing frame number 808
Processing frame number 809
Processing frame number 810
Processing frame number 811
Processing frame number 812
Processing frame number 813
Processing frame number 814
Processing frame number 815
Processing frame number 816
Processing frame number 817
Processing frame number 818
Processing frame number 819
Processing frame number 820
Processing frame number 821
Processing frame number 822
Processing frame number 823
Processing frame number 824
Processing frame number 825
Processing frame number 826
Processing frame number 827
Processing frame number 828
Processing frame number 829
Processing frame number 830
Processing frame number 831
Processing frame number 832
Processing frame number 833
Processing frame number 834
Processing frame number 835
Processing frame number 836
Processing frame number 837
Processing frame number 838
Processing frame number 839
Processing frame number 840
Processing frame number 841
Processing frame number 842
Processing frame number 843
Processing frame number 844
Processing frame number 845
Processing frame number 846
Processing frame number 847
Processing frame number 848
Processing frame number 849
Processing frame number 850
Processing frame number 851
Processing frame number 852
Processing frame number 853
Processing frame number 854
Processing frame number 855
Processing frame number 856
Processing frame number 857
Processing frame number 858
Processing frame number 859
Processing frame number 860
Processing frame number 861
Processing frame number 862
Processing frame number 863
Processing frame number 864
Processing frame number 865
Processing frame number 866
Processing frame number 867
Processing frame number 868
Processing frame number 869
Processing frame number 870
Processing frame number 871
Processing frame number 872
Processing frame number 873
Processing frame number 874
Processing frame number 875
Processing frame number 876
Processing frame number 877
Processing frame number 878
Processing frame number 879
Processing frame number 880
Processing frame number 881
Processing frame number 882
Processing frame number 883
Processing frame number 884
Processing frame number 885
Processing frame number 886
Processing frame number 887
Processing frame number 888
Processing frame number 889
Processing frame number 890
Processing frame number 891
Processing frame number 892
Processing frame number 893
Processing frame number 894
Processing frame number 895
Processing frame number 896
Processing frame number 897
Processing frame number 898
Processing frame number 899
Processing frame number 900
Processing frame number 901
Processing frame number 902
Processing frame number 903
Processing frame number 904
Processing frame number 905
Processing frame number 906
Processing frame number 907
Processing frame number 908
Processing frame number 909
Processing frame number 910
Processing frame number 911
Processing frame number 912
Processing frame number 913
Processing frame number 914
Processing frame number 915
Processing frame number 916
Processing frame number 917
Processing frame number 918
Processing frame number 919
Processing frame number 920
Processing frame number 921
Processing frame number 922
Processing frame number 923
Processing frame number 924
Processing frame number 925
Processing frame number 926
Processing frame number 927
Processing frame number 928
Processing frame number 929
Processing frame number 930
Processing frame number 931
Processing frame number 932
Processing frame number 933
Processing frame number 934
Processing frame number 935
Processing frame number 936
Processing frame number 937
Processing frame number 938
Processing frame number 939
Processing frame number 940
Processing frame number 941
Processing frame number 942
Processing frame number 943
Processing frame number 944
Processing frame number 945
Processing frame number 946
Processing frame number 947
Processing frame number 948
Processing frame number 949
Processing frame number 950
Processing frame number 951
Processing frame number 952
Processing frame number 953
Processing frame number 954
Processing frame number 955
Processing frame number 956
Processing frame number 957
Processing frame number 958
Processing frame number 959
Processing frame number 960
Processing frame number 961
Processing frame number 962
Processing frame number 963
Processing frame number 964
Processing frame number 965
Processing frame number 966
Processing frame number 967
Processing frame number 968
Processing frame number 969
Processing frame number 970
Processing frame number 971
Processing frame number 972
Processing frame number 973
Processing frame number 974
Processing frame number 975
Processing frame number 976
Processing frame number 977
Processing frame number 978
Processing frame number 979
Processing frame number 980
Processing frame number 981
Processing frame number 982
Processing frame number 983
Processing frame number 984
Processing frame number 985
Processing frame number 986
Processing frame number 987
Processing frame number 988
Processing frame number 989
Processing frame number 990
Processing frame number 991
Processing frame number 992
Processing frame number 993
Processing frame number 994
Processing frame number 995
Processing frame number 996
Processing frame number 997
Processing frame number 998
Processing frame number 999
Processing frame number 1000
Frame number 1000: {'frame_number': 1000, 'x1': 85.10054779052734, 'y1': 334.2020263671875, 'x2': 181.1793212890625, 'y2': 424.4140625, 'instance_name': 'tea_ball', 'class_score': 0.9970707893371582, 'segmentation': {'size': [480, 640], 'counts': 'nVX1=`>6L2K8H7L3L3K6M1N3M3N2O1N2M3N2O0N2O101N2N2N1O1O1O100O1O2N10001O0O2N1O10000O10000000000000O100O0100O0100000O10O1O1O1000000O1N201N1O1O2N2O0O1O1O2O0O1O2N2M200O2M2O1O2N2N2N2N3K4N2N6F8K3N3KkVh6'}}
Frame number 1000: {'frame_number': 1000, 'x1': 420.5732727050781, 'y1': 236.66554260253906, 'x2': 441.3321838378906, 'y2': 256.917236328125, 'instance_name': 'nose', 'class_score': 0.9965051412582397, 'segmentation': {'size': [480, 640], 'counts': 'dbU64j>4N2M2O1N2O1N10001OO10000O2N2N1O3L_om2'}}
Frame number 1000: {'frame_number': 1000, 'x1': 458.06500244140625, 'y1': 129.91883850097656, 'x2': 480.6678771972656, 'y2': 152.92935180664062, 'instance_name': 'base_of_tail', 'class_score': 0.9964906573295593, 'segmentation': {'size': [480, 640], 'counts': '\\Yg64i>6K3N2N2N2O0O10001O00O101O001O1N2O1N2N2N3Lf\\Z2'}}
Frame number 1000: {'frame_number': 1000, 'x1': 358.70538330078125, 'y1': 105.5149917602539, 'x2': 512.0758056640625, 'y2': 245.71401977539062, 'instance_name': 'mouse', 'class_score': 0.9955850839614868, 'segmentation': {'size': [480, 640], 'counts': 'j\\X54j>5L2O0O1O1O100000O1N2O1O1O010N2O100O010O01OO2O1O10000O10d^a0>m`^O2N1O1N2aL[OXHP1h7QOUHR1i7oNVHR1j7nNUHS1k7mNSHU1l7mNjG\\1V8dNdGb1\\8^NcGc1]8]NbGd1^8\\N`Gf1`8ZN]Gi1c8WNlFZ2T9gMjFZ2V9fMjFZ2V9fMiF[2W9fMgF[2Y9fM`F`2`9aM]Fa2c9`M[Fa2e9_M[Fa2e9`MYFa2g9`MUFb2l9`MnEd2S:\\MkEe2U:[MjEf2V:[MiEe2W:\\MgEd2[:P10O2N2O1O0O2O001O0O3N1N3N1N2N3M3N2M2N1O2N2M3M3M3N1O2N3J5L3N2O2N1N2L4L5O0O2N1N3I7N1O2N3M4C]OVOgCe0V<BjC;X<CkC:`=LRen1'}}
Frame number 1000: {'frame_number': 1000, 'x1': 449.642333984375, 'y1': 222.0953826904297, 'x2': 471.2927551269531, 'y2': 243.80459594726562, 'instance_name': 'left_ear', 'class_score': 0.977412760257721, 'segmentation': {'size': [480, 640], 'counts': 'XUc66h>4L3N1O2O0O2N100O1000000O0100O2O0O1O2N1OQP_2'}}
Frame number 1000: {'frame_number': 1000, 'x1': 429.0087585449219, 'y1': 208.08079528808594, 'x2': 449.2860412597656, 'y2': 229.2113037109375, 'instance_name': 'left_ear', 'class_score': 0.9381037354469299, 'segmentation': {'size': [480, 640], 'counts': 'hYY66h>4M2M3N2O1O0O2O0000000000O101N1O2M4M2M\\Zi2'}}
Frame number 1000: {'frame_number': 1000, 'x1': 427.6015319824219, 'y1': 208.6620635986328, 'x2': 450.36767578125, 'y2': 229.3686065673828, 'instance_name': 'right_ear', 'class_score': 0.577733039855957, 'segmentation': {'size': [480, 640], 'counts': 'jjX62k>6L2M3N3M2O0O2O000000000000O101N1O2N2M4L\\Zi2'}}
Frame number 1000: {'frame_number': 1000, 'x1': 448.73748779296875, 'y1': 224.34693908691406, 'x2': 472.3518981933594, 'y2': 244.3726348876953, 'instance_name': 'right_ear', 'class_score': 0.3736853301525116, 'segmentation': {'size': [480, 640], 'counts': 'WUc67g>4M2N2N101N1O100000000000O100O100O1O100O2MQa^2'}}
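Each tracked prediction above is a plain Python dict, so per-box quantities such as width, height, and center can be derived directly from the corner coordinates. A minimal sketch using rounded values from the tea_ball record above:

```python
# One prediction record (values rounded from the tea_ball output above).
record = {
    'frame_number': 1000,
    'x1': 85.10, 'y1': 334.20,
    'x2': 181.18, 'y2': 424.41,
    'instance_name': 'tea_ball',
    'class_score': 0.997,
}

# Box width, height, and center point derived from the corner coordinates.
width = record['x2'] - record['x1']
height = record['y2'] - record['y1']
center = ((record['x1'] + record['x2']) / 2,
          (record['y1'] + record['y2']) / 2)

print(width, height, center)
```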
Processing frame number 1001
...
Processing frame number 1309

All the tracking results are collected into a Pandas DataFrame.

df = pd.DataFrame(tracking_results)
df.head()
frame_number x1 y1 x2 y2 instance_name class_score segmentation
0 0 160.236832 190.226120 336.240265 344.675446 mouse 0.999105 {'size': [480, 640], 'counts': '\c\21iZ20VTN1O...
1 0 280.599152 324.396515 301.325714 344.289978 right_ear 0.998074 {'size': [480, 640], 'counts': '[QT44k>3L4N1N2...
2 0 245.531708 228.638016 268.704803 248.297653 base_of_tail 0.997335 {'size': [480, 640], 'counts': '[ac36h>4M2N2N2...
3 0 84.982430 334.311890 184.091339 422.386078 tea_ball 0.997186 {'size': [480, 640], 'counts': 'PfX19e>3J9H8K2...
4 0 304.228790 303.875885 325.790985 327.376221 left_ear 0.993926 {'size': [480, 640], 'counts': 'iX_47f>5M2N2N2...

Calculate the bounding box center point (cx, cy) locations

cx = (df.x1 + df.x2)/2
cy = (df.y1 + df.y2)/2
df['cx'] = cx
df['cy'] = cy
df.head()
frame_number x1 y1 x2 y2 instance_name class_score segmentation cx cy
0 0 160.236832 190.226120 336.240265 344.675446 mouse 0.999105 {'size': [480, 640], 'counts': '\c\21iZ20VTN1O... 248.238548 267.450783
1 0 280.599152 324.396515 301.325714 344.289978 right_ear 0.998074 {'size': [480, 640], 'counts': '[QT44k>3L4N1N2... 290.962433 334.343246
2 0 245.531708 228.638016 268.704803 248.297653 base_of_tail 0.997335 {'size': [480, 640], 'counts': '[ac36h>4M2N2N2... 257.118256 238.467834
3 0 84.982430 334.311890 184.091339 422.386078 tea_ball 0.997186 {'size': [480, 640], 'counts': 'PfX19e>3J9H8K2... 134.536884 378.348984
4 0 304.228790 303.875885 325.790985 327.376221 left_ear 0.993926 {'size': [480, 640], 'counts': 'iX_47f>5M2N2N2... 315.009888 315.626053

Only save the top 1 prediction for each frame for each class

Note: you can change the number to save the top n predictions for each frame and instance name, e.g. head(2), head(5), or head(n). To save all the predictions, use df.to_csv('my_tracking_results.csv').

df_top = df.groupby(['frame_number','instance_name'],sort=False).head(1)
df_top.head()
frame_number x1 y1 x2 y2 instance_name class_score segmentation cx cy
0 0 160.236832 190.226120 336.240265 344.675446 mouse 0.999105 {'size': [480, 640], 'counts': '\c\21iZ20VTN1O... 248.238548 267.450783
1 0 280.599152 324.396515 301.325714 344.289978 right_ear 0.998074 {'size': [480, 640], 'counts': '[QT44k>3L4N1N2... 290.962433 334.343246
2 0 245.531708 228.638016 268.704803 248.297653 base_of_tail 0.997335 {'size': [480, 640], 'counts': '[ac36h>4M2N2N2... 257.118256 238.467834
3 0 84.982430 334.311890 184.091339 422.386078 tea_ball 0.997186 {'size': [480, 640], 'counts': 'PfX19e>3J9H8K2... 134.536884 378.348984
4 0 304.228790 303.875885 325.790985 327.376221 left_ear 0.993926 {'size': [480, 640], 'counts': 'iX_47f>5M2N2N2... 315.009888 315.626053
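Within each frame the detections are ordered by class_score in descending order, which is why head(1) keeps the highest-scoring prediction per (frame_number, instance_name) pair. A minimal synthetic sketch (the values are made up for illustration):

```python
import pandas as pd

# Two detections of the same instance in frame 0; scores are sorted
# descending within a frame, as in the detector output above.
toy = pd.DataFrame({
    'frame_number': [0, 0, 1],
    'instance_name': ['left_ear', 'left_ear', 'left_ear'],
    'class_score': [0.98, 0.55, 0.91],
})

# Keep only the first (highest-scoring) row per frame/instance group.
toy_top = toy.groupby(['frame_number', 'instance_name'], sort=False).head(1)
print(toy_top)
```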

Visualize the center points with a Plotly scatter plot

df_vis = df_top[df_top.instance_name != 'Text'][['frame_number','cx','cy','instance_name']]
import plotly.express as px
import plotly.graph_objects as go
import numpy as np

fig = px.scatter(df_vis, 
                 x="cx",
                 y="cy", 
                 color="instance_name",
                 hover_data=['frame_number','cx','cy'])
fig.show()
from pathlib import Path
tracking_results_csv = f"{Path(dataset).stem}_{Path(VIDEO_INPUT).stem}_{cfg.SOLVER.MAX_ITER}_iters_mask_rcnn_tracking_results_with_segmentation.csv"
df_top.to_csv(tracking_results_csv)

Download the tracking result CSV file to your local device

if IN_COLAB:
    from google.colab import files
    files.download(tracking_results_csv)

The following sections are optional.

Calculate the distance between a pair of instances in a given frame

def paired_distance(frame_number,
                    this_instance='frog_m_2',
                    other_instance='frog_f_2'):
    """Euclidean distance between the centers of two instances in a frame."""
    df_dis = df_top[df_top["frame_number"] == frame_number][['cx', 'cy', 'instance_name']]
    df_this = df_dis[df_dis.instance_name == this_instance]
    df_other = df_dis[df_dis.instance_name == other_instance]
    try:
        dist = np.linalg.norm(df_this[['cx', 'cy']].values - df_other[['cx', 'cy']].values)
    except Exception:
        # The subtraction fails when the two instances were not both detected.
        dist = None
    return dist
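The distance itself is just the Euclidean norm of the difference between the two center points. A quick sanity check on hand-picked coordinates (a 3-4-5 right triangle):

```python
import numpy as np

# Centers of two hypothetical instances in the same frame.
this_center = np.array([[0.0, 0.0]])
other_center = np.array([[3.0, 4.0]])

# Euclidean distance between the centers, as in paired_distance above.
dist = np.linalg.norm(this_center - other_center)
print(dist)  # 5.0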

Calculate the distance an instance moved between the current and the previous frame

def instance_distance_between_frame(frame_number,
                                    instance_name='frog_m_1'):
    """Distance an instance's center moved since the previous frame."""
    if frame_number < 1:
      return 0
    previous_frame_number = frame_number - 1
    df_dis = df_top[df_top["frame_number"] == frame_number][['cx', 'cy', 'instance_name']]
    df_dis_prev = df_top[df_top["frame_number"] == previous_frame_number][['cx', 'cy', 'instance_name']]
    df_dis = df_dis[df_dis.instance_name == instance_name]
    df_dis_prev = df_dis_prev[df_dis_prev.instance_name == instance_name]

    try:
      dist = np.linalg.norm(df_dis[['cx', 'cy']].values - df_dis_prev[['cx', 'cy']].values)
    except Exception:
      # The subtraction fails when the instance is missing in either frame.
      dist = None

    return dist

df_top['dist_from_previous_frame_frog_m_1'] = df_top.frame_number.apply(instance_distance_between_frame)
/home/jeremy/anaconda3/envs/annolid/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
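The warning appears because df_top is a slice of df, so the new column is being assigned on a view. One way to avoid it (a sketch, not required for the results here) is to take an explicit copy before adding columns:

```python
import pandas as pd

df_demo = pd.DataFrame({'frame_number': [0, 1], 'cx': [1.0, 2.0]})

# Taking .copy() of the slice makes the column assignment unambiguous,
# so pandas does not raise SettingWithCopyWarning.
df_demo_top = df_demo.head(1).copy()
df_demo_top['dist'] = 0.0
print(df_demo_top)
```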

The total distance traveled by the frog male in Tank 1, in pixels. (In this mouse example no 'frog_m_1' instance is ever detected, so the total below is 0.0.)

df_top['dist_from_previous_frame_frog_m_1'].sum()
0.0
fig = px.line(x=df_top.frame_number, y=df_top.dist_from_previous_frame_frog_m_1, labels={'x':'frame_number', 'y':'distance from previous frame frog_m_1'})
fig.show()

Download and save the results to your local device

Change the CSV file name as desired.

tracking_results_with_area_perimeter_csv = tracking_results_csv.replace('.csv', '_final.csv')
df_top.to_csv(tracking_results_with_area_perimeter_csv)
if IN_COLAB:
    files.download(tracking_results_with_area_perimeter_csv)

Distance between frog male in tank 2 and frog female in tank 2 in pixels

df_top['dist_frog_m2_f2'] = df_top.frame_number.apply(paired_distance)
fig = px.line(x=df_top.frame_number, y=df_top.dist_frog_m2_f2, labels={'x':'frame_number', 'y':'distance between frog male in tank 2 to frog female in tank 2'})
fig.show()

Save and download the trained model weights

final_model_file = os.path.join(cfg.OUTPUT_DIR, 'model_final.pth')
if IN_COLAB:
    files.download(final_model_file)